Using Media-Pipe for full-body tracking, masking, blurring, and movement tracing¶


Wim Pouw (wim.pouw@donders.ru.nl) & Sho Akamine (Sho.Akamine@mpi.nl)


Info documents¶

This Python notebook walks you through taking videos with a single person in frame as input and producing kinematic time series as output, optionally masking, blurring, and adding hand-movement traces to the videos with facial, hand, and arm skeletons overlaid.

The current code is derived and modified from the masked-piper tool, a simple but effective modification of Google's MediaPipe Holistic Tracking that serves as a lightweight, CPU-based tool to mask your video data while maintaining background information and preserving information about body kinematics.

  • Repository: https://github.com/WimPouw/envisionBOX_modulesWP/tree/main/Mediapipe_Optional_Masking

  • Jupyter notebook: https://github.com/WimPouw/envisionBOX_modulesWP/blob/main/MultimodalMerging/Masking_Mediapiping.ipynb

Current Github: https://github.com/WimPouw/TowardsMultimodalOpenScience

Version¶

2.0.0 (we added functionalities such as blurring and movement tracing, and fixed some bugs with missing right/left body parts)

Additional information on the backbone of the tool (MediaPipe Holistic Tracking)¶

https://google.github.io/mediapipe/solutions/holistic.html

Citation of mediapipe¶

  • citation: Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., ... & Grundmann, M. (2019). MediaPipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172.

Citation of masked piper¶

  • citation: Owoyele, B., Trujillo, J., De Melo, G., & Pouw, W. (2022). Masked-Piper: Masking personal identities in visual recordings while preserving multimodal information. SoftwareX, 20, 101236.
  • Original Repo: https://github.com/WimPouw/TowardsMultimodalOpenScience

Citation of this code¶

  • citation: Pouw, W., & Akamine, S. (2025). Using Media-Pipe for full-body tracking, masking, blurring, and movement tracing. Retrieved from https://github.com/WimPouw/envisionBOX_modulesWP/tree/main/Mediapipe_Optional_Masking

Use¶

Make sure to install all the packages in requirements.txt. Then move the videos that you want to mask into the input folder and run this code, which loops through all the videos contained in the input folder and saves all the results in the output folders.
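If you are setting this up from scratch, the folders the notebook expects can be created in one go; this is a minimal sketch using the folder names assumed throughout this notebook:

```python
import os

# Create the expected input/output folders if they do not exist yet
# (no-op for folders that are already present)
for folder in ["./Input_Videos/", "./Output_Videos/", "./Output_TimeSeries/"]:
    os.makedirs(folder, exist_ok=True)
```
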

Setting up packages and folder structure¶

In [1]:
#load in required packages
import mediapipe as mp #mediapipe
import cv2 #opencv
import math #basic operations
import numpy as np #basic operations
import pandas as pd #data wrangling
import csv #csv saving
import os #some basic functions for inspecting folder structure etc.

#list all videos in input_videofolder
from os import listdir
from os.path import isfile, join
mypath = "./Input_Videos/" #this is your folder with (all) your video(s)
vfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))] #loop through the filenames and collect them in a list
#time series output folder
inputfol = "./Input_Videos/"
outputf_mask = "./Output_Videos/"
outtputf_ts = "./Output_TimeSeries/"

#check videos to be processed
print("The following folder is set as the output folder where all the pose time series are stored")
print(os.path.abspath(outtputf_ts))
print("\n The following folder is set as the output folder for saving the masked videos ")
print(os.path.abspath(outputf_mask))
print("\n The following video(s) will be processed for masking: ")
print(vfiles)
The following folder is set as the output folder where all the pose time series are stored
d:\Research_projects\envisionBOX_modulesWP\Mediapipe_Optional_Masking\Output_TimeSeries

 The following folder is set as the output folder for saving the masked videos 
d:\Research_projects\envisionBOX_modulesWP\Mediapipe_Optional_Masking\Output_Videos

 The following video(s) will be processed for masking: 
['ted_kid.mp4']

Initializing marker names and key methods from mediapipe¶

In [2]:
#initialize modules and functions

#load in mediapipe modules
mp_holistic = mp.solutions.holistic
# Import drawing_utils and drawing_styles.
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

##################FUNCTIONS AND OTHER VARIABLES
#landmarks 33x that are used by Mediapipe (Blazepose)
markersbody = ['NOSE', 'LEFT_EYE_INNER', 'LEFT_EYE', 'LEFT_EYE_OUTER', 'RIGHT_EYE_INNER', 'RIGHT_EYE', 'RIGHT_EYE_OUTER',
          'LEFT_EAR', 'RIGHT_EAR', 'MOUTH_LEFT', 'MOUTH_RIGHT', 'LEFT_SHOULDER', 'RIGHT_SHOULDER', 'LEFT_ELBOW', 
          'RIGHT_ELBOW', 'LEFT_WRIST', 'RIGHT_WRIST', 'LEFT_PINKY', 'RIGHT_PINKY', 'LEFT_INDEX', 'RIGHT_INDEX',
          'LEFT_THUMB', 'RIGHT_THUMB', 'LEFT_HIP', 'RIGHT_HIP', 'LEFT_KNEE', 'RIGHT_KNEE', 'LEFT_ANKLE', 'RIGHT_ANKLE',
          'LEFT_HEEL', 'RIGHT_HEEL', 'LEFT_FOOT_INDEX', 'RIGHT_FOOT_INDEX']

markershands = ['LEFT_WRIST', 'LEFT_THUMB_CMC', 'LEFT_THUMB_MCP', 'LEFT_THUMB_IP', 'LEFT_THUMB_TIP', 'LEFT_INDEX_FINGER_MCP',
              'LEFT_INDEX_FINGER_PIP', 'LEFT_INDEX_FINGER_DIP', 'LEFT_INDEX_FINGER_TIP', 'LEFT_MIDDLE_FINGER_MCP', 
               'LEFT_MIDDLE_FINGER_PIP', 'LEFT_MIDDLE_FINGER_DIP', 'LEFT_MIDDLE_FINGER_TIP', 'LEFT_RING_FINGER_MCP', 
               'LEFT_RING_FINGER_PIP', 'LEFT_RING_FINGER_DIP', 'LEFT_RING_FINGER_TIP', 'LEFT_PINKY_FINGER_MCP', 
               'LEFT_PINKY_FINGER_PIP', 'LEFT_PINKY_FINGER_DIP', 'LEFT_PINKY_FINGER_TIP',
              'RIGHT_WRIST', 'RIGHT_THUMB_CMC', 'RIGHT_THUMB_MCP', 'RIGHT_THUMB_IP', 'RIGHT_THUMB_TIP', 'RIGHT_INDEX_FINGER_MCP',
              'RIGHT_INDEX_FINGER_PIP', 'RIGHT_INDEX_FINGER_DIP', 'RIGHT_INDEX_FINGER_TIP', 'RIGHT_MIDDLE_FINGER_MCP', 
               'RIGHT_MIDDLE_FINGER_PIP', 'RIGHT_MIDDLE_FINGER_DIP', 'RIGHT_MIDDLE_FINGER_TIP', 'RIGHT_RING_FINGER_MCP', 
               'RIGHT_RING_FINGER_PIP', 'RIGHT_RING_FINGER_DIP', 'RIGHT_RING_FINGER_TIP', 'RIGHT_PINKY_FINGER_MCP', 
               'RIGHT_PINKY_FINGER_PIP', 'RIGHT_PINKY_FINGER_DIP', 'RIGHT_PINKY_FINGER_TIP']
facemarks = [str(x) for x in range(478)] #there are 478 points for the face mesh (see google holistic face mesh info for landmarks)

print("Note that we have the following number of pose keypoints for markers body")
print(len(markersbody))

print("\n Note that we have the following number of pose keypoints for markers hands")
print(len(markershands))

print("\n Note that we have the following number of pose keypoints for markers face")
print(len(facemarks))

#set up the column names and objects for the time series data (add time as the first variable)
markerxyzbody = ['time']
markerxyzhands = ['time']
markerxyzface = ['time']

for mark in markersbody:
    for pos in ['X', 'Y', 'Z', 'visibility']: #for markers of the body you also have a visibility reliability score
        nm = pos + "_" + mark
        markerxyzbody.append(nm)
for mark in markershands:
    for pos in ['X', 'Y', 'Z']:
        nm = pos + "_" + mark
        markerxyzhands.append(nm)
for mark in facemarks:
    for pos in ['X', 'Y', 'Z']:
        nm = pos + "_" + mark
        markerxyzface.append(nm)

#check if there are numbers in a string
def num_there(s):
    return any(i.isdigit() for i in s)

#take some google classification object and convert it into a string
def makegoginto_str(gogobj):
    gogobj = str(gogobj).strip("[]")
    gogobj = gogobj.split("\n")
    return(gogobj[:-1]) #ignore last element as this has nothing

#parse the stringified position traces into clean numeric values
def listpostions(newsamplemarks):
    newsamplemarks = makegoginto_str(newsamplemarks)
    tracking_p = []
    for value in newsamplemarks:
        if num_there(value):
            stripped = value.split(':', 1)[1]
            stripped = stripped.strip() #remove spaces in the string if present
            tracking_p.append(stripped) #add to this list  
    return(tracking_p)
Note that we have the following number of pose keypoints for markers body
33

 Note that we have the following number of pose keypoints for markers hands
42

 Note that we have the following number of pose keypoints for markers face
478
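To see what the parsing helpers above do, here is the same logic applied to a mock of the string that `str(results.pose_landmarks)` produces for a single landmark (the x/y/z/visibility values are invented for illustration):

```python
def num_there(s):
    # True if the string contains at least one digit
    return any(i.isdigit() for i in s)

# Mock repr of one MediaPipe landmark (values invented)
mock = "landmark {\n  x: 0.5\n  y: 0.25\n  z: -0.1\n  visibility: 0.99\n}\n"

# Same steps as makegoginto_str + listpostions: split into lines,
# drop the trailing empty element, keep only lines with numbers,
# and strip everything before the colon
lines = mock.strip("[]").split("\n")[:-1]
values = [line.split(":", 1)[1].strip() for line in lines if num_there(line)]
print(values)  # ['0.5', '0.25', '-0.1', '0.99']
```
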

Main procedure Masked-Piper¶

The following chunk of code loops through all the videos you have placed in the input folder, then assesses each frame for body poses, extracts kinematic info, masks or blurs the body or face in a new frame that keeps (or removes) the background, projects the kinematic info onto the mask, and stores the kinematic info for that frame into the time series .csv files for the hands, body, and face.

We advise you to play around with the settings to see which animations you like best. The current default is a full-body blur with the full-body skeleton imposed on the image, plus motion tracing of the index fingers.
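As an example of an alternative configuration, the flags in the next cell could be set as follows to blur only the face, with no skeleton and no finger traces (variable names are the ones defined in the cell below):

```python
# Alternative configuration: face blur only, no skeleton, no traces
skeleton = False
skeleton_face_only = False
whitebackground = False
maskingbody = False
maskingface = False
blurringface = True     # blur only the face region
blurringbody = False    # leave the rest of the body visible
blurringfactor = 1      # full blur strength (0-1)
add_finger_traces = False
```
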

In [ ]:
# MASKING AND BLURRING OPTIONS 
skeleton = True
skeleton_face_only = False         # Only show face skeleton (no body, no hands)
whitebackground = False            # Grey background with skeleton only (no body, no face)
maskingbody = False                # Masks the body silhouette with fully black color (original masked-piper approach)
maskingface = False                # Masks the face (fully black color)
blurringface = False               # Blurs the face
blurringbody = True                # Blurs the body region
blurringfactor = 1                 # Blurring intensity (0-1, 1 is full blur)
# TRACE OPTIONS
add_finger_traces = True           # Add fading traces for index fingers
trace_length_seconds = 2           # Length of trace in seconds
trace_color_left = (0, 255, 0)     # Green for left index finger trace
trace_color_right = (0, 0, 255)    # Blue for right index finger trace

# Process videos
for vidf in vfiles:
    print(f"Processing video: {vidf}")
    print(f"Video {vfiles.index(vidf)+1} of {len(vfiles)}")
    
    videoname = vidf
    videoloc = inputfol + videoname
    capture = cv2.VideoCapture(videoloc)
    frameWidth = capture.get(cv2.CAP_PROP_FRAME_WIDTH)
    frameHeight = capture.get(cv2.CAP_PROP_FRAME_HEIGHT)
    samplerate = capture.get(cv2.CAP_PROP_FPS)
    
    fourcc = cv2.VideoWriter_fourcc(*'MP4V')
    out = cv2.VideoWriter(outputf_mask + videoname, fourcc, 
                          fps=samplerate, frameSize=(int(frameWidth), int(frameHeight)))
    
    time = 0
    tsbody = [markerxyzbody]
    tshands = [markerxyzhands]
    tsface = [markerxyzface]
    
    # Initialize trace buffers for index fingers
    if add_finger_traces:
        trace_frames = int(trace_length_seconds * samplerate)
        left_finger_trace = []  # Store (x, y) positions for left index finger
        right_finger_trace = []  # Store (x, y) positions for right index finger
    
    with mp_holistic.Holistic(static_image_mode=False, enable_segmentation=True, refine_face_landmarks=True) as holistic:
        while True:
            ret, image = capture.read()
            if ret == True:
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                results = holistic.process(image)
                h, w, c = image.shape
                
                if results.face_landmarks is not None: #only process frames where a face was detected
                    # Apply white background modes first (they override other options)
                    if whitebackground:
                        # Create white background with only landmarks
                        white_image = np.full((h, w, 3), (255, 255, 255), dtype=np.uint8)
                        original_image = cv2.cvtColor(white_image, cv2.COLOR_RGB2BGR)
                    else:
                        # Convert to BGR for further processing
                        original_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
                    
                    # Apply body masking if enabled (makes body fully black - original masked-piper)
                    if maskingbody and not whitebackground:
                        # Original masking logic
                        image_with_alpha = np.concatenate([image, np.full((h, w, 1), 255, dtype=np.uint8)], axis=-1)
                        mask_img = np.zeros_like(image, dtype=np.uint8)
                        mask_img[:, :] = (255,255,255)
                        segm_2class = 0.2 + 0.8 * results.segmentation_mask
                        segm_2class = np.repeat(segm_2class[..., np.newaxis], 3, axis=2)
                        annotated_image = mask_img * segm_2class * (1 - segm_2class)
                        mask = np.concatenate([annotated_image, np.full((h, w, 1), 255, dtype=np.uint8)], axis=-1)
                        image_with_alpha[mask==0]=0
                        original_image = cv2.cvtColor(image_with_alpha, cv2.COLOR_RGB2BGR)
                    
                    # Apply blurring to body if enabled
                    if blurringbody and not whitebackground:
                        # Get body mask
                        segm_2class = 0.2 + 0.8 * results.segmentation_mask
                        body_mask = segm_2class
                        
                        kernel_size = int(51 * blurringfactor)
                        if kernel_size % 2 == 0:
                            kernel_size += 1
                        blurred_image = cv2.GaussianBlur(original_image, (kernel_size, kernel_size), 0)
                        
                        # Apply blur only to body region
                        body_mask_3channel = cv2.merge([body_mask] * 3)
                        original_image = (original_image * (1 - body_mask_3channel * blurringfactor) + 
                                        blurred_image * (body_mask_3channel * blurringfactor)).astype(np.uint8)
                    
                    # Apply face blurring if enabled
                    if blurringface and results.face_landmarks and not whitebackground:
                        face_mask = np.zeros((h, w), dtype=np.uint8)
                        landmarks = results.face_landmarks.landmark
                        
                        # Get all face points for a full face mask
                        face_points = np.array([(int(landmarks[i].x * w), int(landmarks[i].y * h)) 
                                              for i in range(len(landmarks))], dtype=np.int32)
                        
                        # Create convex hull of face points
                        hull = cv2.convexHull(face_points)
                        cv2.fillConvexPoly(face_mask, hull, 255)
                        
                        # Blur the face region
                        kernel_size = int(51 * blurringfactor)
                        if kernel_size % 2 == 0:
                            kernel_size += 1
                        blurred_image = cv2.GaussianBlur(original_image, (kernel_size, kernel_size), 0)
                        
                        # Apply blur only to face region
                        face_mask_3channel = cv2.cvtColor(face_mask, cv2.COLOR_GRAY2BGR) / 255.0
                        original_image = (original_image * (1 - face_mask_3channel * blurringfactor) + 
                                        blurred_image * (face_mask_3channel * blurringfactor)).astype(np.uint8)
                    
                    # Apply face masking if enabled (makes face fully black)
                    if maskingface and results.face_landmarks and not whitebackground:
                        face_mask = np.zeros((h, w), dtype=np.uint8)
                        landmarks = results.face_landmarks.landmark
                        
                        # Create mask for entire face
                        face_points = np.array([(int(landmarks[i].x * w), int(landmarks[i].y * h)) 
                                              for i in range(len(landmarks))], dtype=np.int32)
                        
                        # Create convex hull of face points for masking
                        hull = cv2.convexHull(face_points)
                        cv2.fillConvexPoly(face_mask, hull, 255)
                        
                        # Apply masking - make face fully black
                        original_image[face_mask > 0] = (0, 0, 0)  # Make face fully black
                    
                    # Add index finger traces if enabled
                    if add_finger_traces:
                        # Get current index finger positions
                        left_finger_pos = None
                        right_finger_pos = None
                        
                        if results.left_hand_landmarks:
                            # Index finger tip landmark index is 8
                            left_landmark = results.left_hand_landmarks.landmark[8]
                            left_finger_pos = (int(left_landmark.x * w), int(left_landmark.y * h))
                        
                        if results.right_hand_landmarks:
                            # Index finger tip landmark index is 8
                            right_landmark = results.right_hand_landmarks.landmark[8]
                            right_finger_pos = (int(right_landmark.x * w), int(right_landmark.y * h))
                        
                        # Add current positions to trace buffers
                        if left_finger_pos:
                            left_finger_trace.append(left_finger_pos)
                        if right_finger_pos:
                            right_finger_trace.append(right_finger_pos)
                        
                        # Keep trace buffers to specified length
                        if len(left_finger_trace) > trace_frames:
                            left_finger_trace.pop(0)
                        if len(right_finger_trace) > trace_frames:
                            right_finger_trace.pop(0)
                        
                        # Draw fading traces for left index finger
                        for i in range(len(left_finger_trace)-1):
                            if i < len(left_finger_trace) and (i+1) < len(left_finger_trace):
                                # Calculate opacity based on position in trace
                                opacity = int(255 * (i+1) / len(left_finger_trace))
                                alpha = opacity / 255.0
                                
                                # Create a copy for transparency effect
                                overlay = original_image.copy()
                                cv2.line(overlay, left_finger_trace[i], left_finger_trace[i+1], 
                                       trace_color_left, 2)
                                original_image = cv2.addWeighted(overlay, alpha, original_image, 1-alpha, 0)
                        
                        # Draw fading traces for right index finger
                        for i in range(len(right_finger_trace)-1):
                            if i < len(right_finger_trace) and (i+1) < len(right_finger_trace):
                                # Calculate opacity based on position in trace
                                opacity = int(255 * (i+1) / len(right_finger_trace))
                                alpha = opacity / 255.0
                                
                                # Create a copy for transparency effect
                                overlay = original_image.copy()
                                cv2.line(overlay, right_finger_trace[i], right_finger_trace[i+1], 
                                       trace_color_right, 2)
                                original_image = cv2.addWeighted(overlay, alpha, original_image, 1-alpha, 0)
                    
                    # Draw landmarks conditionally
                    if skeleton:
                        mp_drawing.draw_landmarks(original_image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
                        mp_drawing.draw_landmarks(original_image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
                        
                        mp_drawing.draw_landmarks(
                            original_image,
                            results.face_landmarks,
                            mp_holistic.FACEMESH_TESSELATION,
                            landmark_drawing_spec=None,
                            connection_drawing_spec=mp_drawing_styles.get_default_face_mesh_tesselation_style()
                        )
                        mp_drawing.draw_landmarks(
                            original_image,
                            results.pose_landmarks,
                            mp_holistic.POSE_CONNECTIONS,
                            landmark_drawing_spec=mp_drawing_styles.get_default_pose_landmarks_style()
                        )
                    elif skeleton_face_only:
                        # For face-only mode, draw only face landmarks
                        mp_drawing.draw_landmarks(
                            original_image,
                            results.face_landmarks,
                            mp_holistic.FACEMESH_TESSELATION,
                            landmark_drawing_spec=None,
                            connection_drawing_spec=mp_drawing_styles.get_default_face_mesh_tesselation_style()
                        )
                    
                    # Save time series data - FIXED for hand swapping issue
                    samplebody = listpostions(results.pose_landmarks)
                    sampleface = listpostions(results.face_landmarks)
                    
                    # Process hands separately
                    sampleLH = listpostions(results.left_hand_landmarks)
                    sampleRH = listpostions(results.right_hand_landmarks)
                    
                    # Fill empty hands with placeholders so the columns stay aligned
                    if len(sampleLH) == 0:
                        sampleLH = ["" for x in range(int(len(markerxyzhands)/2))]
                    if len(sampleRH) == 0:
                        sampleRH = ["" for x in range(int(len(markerxyzhands)/2))]
                    
                    # Combine hands
                    samplehands = sampleLH + sampleRH
                    
                    # Add time
                    samplebody.insert(0, time)
                    samplehands.insert(0, time)
                    sampleface.insert(0, time)
                    
                    # Append to time series
                    tsbody.append(samplebody)
                    tshands.append(samplehands)
                    tsface.append(sampleface)
                    
                else:
                    original_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
                    # Add NaN data
                    samplebody = [np.nan for x in range(len(markerxyzbody)-1)]
                    samplehands = [np.nan for x in range(len(markerxyzhands)-1)]
                    sampleface = [np.nan for x in range(len(markerxyzface)-1)]
                    samplebody.insert(0, time)
                    samplehands.insert(0, time)
                    sampleface.insert(0, time)
                    tsbody.append(samplebody)
                    tshands.append(samplehands)
                    tsface.append(sampleface)
                
                cv2.imshow("resizedimage", original_image)
                out.write(original_image)
                time = time + (1000/samplerate)
                
            if cv2.waitKey(1) == 27:
                break
            if ret == False:
                break
    
    out.release()
    capture.release()
    cv2.destroyAllWindows()
    
    # Save CSV files
    with open(outtputf_ts + vidf[:-4] + '_body.csv', 'w+', newline='') as filebody:
        csv.writer(filebody).writerows(tsbody)
    
    with open(outtputf_ts + vidf[:-4] + '_hands.csv', 'w+', newline='') as filehands:
        csv.writer(filehands).writerows(tshands)
    
    with open(outtputf_ts + vidf[:-4] + '_face.csv', 'w+', newline='') as fileface:
        csv.writer(fileface).writerows(tsface)

print("Done with processing all folders; go look in your output folders!") 
Processing video: ted_kid.mp4
Video 1 of 1
Done with processing all videos; go look in your output folders!
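Once the time series are written, you can load them back in for analysis. A minimal sketch of computing left-wrist speed from the body CSV follows; here we build a tiny mock DataFrame with invented values rather than reading a file, but in practice you would use `pd.read_csv("./Output_TimeSeries/ted_kid_body.csv")` with the column names set up earlier (e.g. `X_LEFT_WRIST`):

```python
import numpy as np
import pandas as pd

# Mock of a few rows of the *_body.csv output (values invented);
# time is in milliseconds, coordinates are normalized image coordinates
body = pd.DataFrame({
    "time": [0.0, 40.0, 80.0],            # 25 fps -> 40 ms per frame
    "X_LEFT_WRIST": [0.50, 0.52, 0.55],
    "Y_LEFT_WRIST": [0.40, 0.41, 0.43],
})

# Frame-to-frame displacement of the left wrist, then speed in units/second
dx = body["X_LEFT_WRIST"].diff()
dy = body["Y_LEFT_WRIST"].diff()
dt = body["time"].diff() / 1000.0         # ms -> s
speed = np.sqrt(dx**2 + dy**2) / dt
print(speed.tolist())
```

Note that the first entry is NaN (there is no preceding frame to difference against), and that frames where tracking failed contain NaNs in the CSV, which propagate through the speed computation.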

Example output (full-body blur + skeleton + tracing)¶
